Release: round-13 batch (10 features, v161-v170) #383

Merged: JE-Chen merged 21 commits into main from dev on Jun 23, 2026


JE-Chen (Member) commented Jun 23, 2026

Release — round-13 vision / OCR-layout / agent-loop batch

Ships 10 net-new features (#373–#382, docs v161–v170) to main, all merged to dev CI-green (SonarCloud quality gate + Codacy issues=0 + GitHub Actions matrices + Docker headless image). Each ships the full 5-layer surface (headless core → facade → AC_* executor → MCP tool → Script Builder) + headless unit test + EN/Zh docs + changelog.

Vision matching robustness

  • match_trust (v161) — second-peak ratio + PSR ambiguity scoring; extracted shared visual_match._score_map.
  • match_autothresh (v164) — Otsu on the score map (no hand-tuned min_score).
  • edge_match (v168) — Chamfer / distance-transform matching for low-texture / re-themed glyphs.

OCR / layout intelligence

  • table_grid_fill (v162) — fuse find_grid lines + OCR boxes → addressable table.
  • column_layout (v165) — whitespace-projection columns for borderless tables.
  • form_fields (v166) — multi-direction label↔value association + checkbox state.

Computer-use agent loop

  • observation_delta (v163) — token-budgeted "what changed since last step".
  • action_effect (v167) — classify did-my-click-do-anything with target-local attribution.
  • postcondition (v169) — declarative expected-outcome specs diffed vs before.
  • step_repair (v170) — repair-tactic policy + bounded retry loop (completes the action_effect + postcondition + step_repair self-correction trio).

Merge with --merge (no branch delete; dev stays the working branch).

JE-Chen added 21 commits June 24, 2026 02:36
match_template returns only the top score and clicks it, but a duplicate
toolbar button or a near-identical sibling correlates ~0.95 in two places, so
a high score is not an unambiguous match. Add a Lowe-style ratio test for
pixel templates: inspect the full correlation surface, compare the global peak
to the next-best peak outside an exclusion window, compute PSR, and flag
strong-but-ambiguous matches. Reuses a new visual_match._score_map.
…batch

Add trust-scored template matching (ambiguity / PSR)
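The ratio test and PSR described in this commit can be sketched in pure Python. This is a minimal illustration, not the code in `visual_match`; the names `peak_trust` and `is_ambiguous`, the `exclude` window size and the `max_ratio` cut-off are all assumptions made for the example, and the score map is a plain 2-D list rather than an OpenCV result.

```python
from statistics import mean, pstdev


def peak_trust(score_map, exclude=1):
    """Return (peak_xy, ratio, psr) for a 2-D list of correlation scores.

    Hypothetical helper: ratio = second-best peak / global peak,
    psr = (peak - sidelobe mean) / sidelobe standard deviation.
    """
    cells = [(v, x, y) for y, row in enumerate(score_map) for x, v in enumerate(row)]
    best, bx, by = max(cells)
    # Everything outside the exclusion window around the global peak is sidelobe.
    outside = [v for v, x, y in cells if abs(x - bx) > exclude or abs(y - by) > exclude]
    if not outside:
        return (bx, by), 0.0, float("inf")
    second = max(outside)
    ratio = second / best if best > 0 else 1.0
    sigma = pstdev(outside)
    psr = (best - mean(outside)) / sigma if sigma > 0 else float("inf")
    return (bx, by), ratio, psr


def is_ambiguous(score_map, max_ratio=0.9, exclude=1):
    """Flag a match whose runner-up peak is nearly as strong as the winner."""
    _, ratio, _ = peak_trust(score_map, exclude)
    return ratio > max_ratio
```

With this shape, two near-identical toolbar buttons scoring 0.95 and 0.94 give a ratio close to 1 and are flagged, while a lone 0.95 peak over a 0.1 background is not.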
edge_lines.find_grid recovers a bordered table's geometry but leaves the cells
empty; OCR gives the text but no structure, and nothing joined them. Drop OCR
boxes into the grid (assigned by cell-centre, gated by an overlap fraction),
concatenate each cell's text in reading order, flag merged-cell spans, and
convert to records / CSV. Pure-stdlib over plain dicts.
…ill-batch

Add table_grid_fill: fill a ruling-line grid with OCR text
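A rough sketch of the cell-centre assignment with an overlap gate, under stated assumptions: the grid is given as sorted ruling-line coordinates `xs` / `ys`, OCR boxes are plain dicts with `x`, `y`, `w`, `h`, `text`, and `fill_grid` / `min_overlap` are hypothetical names rather than the module's real API. Merged-cell detection and CSV export are left out.

```python
from bisect import bisect_right


def fill_grid(xs, ys, boxes, min_overlap=0.5):
    """Drop OCR boxes into grid cells and return a rows x cols table of text."""
    rows, cols = len(ys) - 1, len(xs) - 1
    cells = [[[] for _ in range(cols)] for _ in range(rows)]
    for b in boxes:
        cx, cy = b["x"] + b["w"] / 2, b["y"] + b["h"] / 2
        c, r = bisect_right(xs, cx) - 1, bisect_right(ys, cy) - 1
        if not (0 <= c < cols and 0 <= r < rows):
            continue  # centre falls outside the grid
        # Overlap gate: fraction of the box area that lies inside the chosen cell.
        ox = min(b["x"] + b["w"], xs[c + 1]) - max(b["x"], xs[c])
        oy = min(b["y"] + b["h"], ys[r + 1]) - max(b["y"], ys[r])
        area = b["w"] * b["h"]
        if area <= 0 or max(ox, 0) * max(oy, 0) / area < min_overlap:
            continue
        cells[r][c].append(b)
    # Reading order inside a cell: top-to-bottom, then left-to-right.
    return [
        [" ".join(b["text"] for b in sorted(cell, key=lambda b: (b["y"], b["x"])))
         for cell in row]
        for row in cells
    ]
```

A box that straddles several cells has its centre in one of them but fails the overlap gate, so it is dropped instead of polluting that cell.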
serialize_observation renders one full frame (blows the token budget every
turn); element_diff gives the stable-ID correspondence but stops at element
pairs. Add the missing serializer: diff two frames, classify matched elements
as changed or stable, render only the churn as +/~/- lines (added & changed
first, stable dropped, capped at max_lines). Reuses element_diff.match_elements
and observation.observation_index.
…delta-batch

Add token-budgeted observation delta (what changed)
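The +/~/- rendering can be illustrated as follows. This is a simplified sketch that assumes each frame is already a dict keyed by a stable element ID (the real code reuses `element_diff.match_elements` for that step); the function signature, the compared `fields` and the truncation notice are assumptions for the example.

```python
def observation_delta(before, after, max_lines=20, fields=("role", "text", "enabled")):
    """Render only the churn between two frames as +/~/- lines.

    before / after: dicts mapping a stable element ID to a dict of fields.
    Stable (unchanged) elements are dropped entirely.
    """
    added, changed, removed = [], [], []
    for eid, el in after.items():
        old = before.get(eid)
        if old is None:
            added.append(f"+ {eid} {el.get('role', '?')} {el.get('text', '')!r}")
            continue
        diffs = [f"{k}: {old.get(k)!r} -> {el.get(k)!r}"
                 for k in fields if old.get(k) != el.get(k)]
        if diffs:
            changed.append(f"~ {eid} " + ", ".join(diffs))
    for eid, el in before.items():
        if eid not in after:
            removed.append(f"- {eid} {el.get('role', '?')} {el.get('text', '')!r}")
    lines = added + changed + removed  # added and changed come first
    if len(lines) > max_lines:
        keep = max(max_lines - 1, 0)
        lines = lines[:keep] + [f"... {len(lines) - keep} more changes omitted"]
    return lines
```

Because removals sort last, they are the first lines to be cut when the cap is hit, which matches the "added & changed first" ordering in the commit message.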
Every match_template_all call forces a hand-tuned min_score: too low floods
NMS, too high drops re-themed targets, and the right value differs per asset.
Run Otsu on the correlation score histogram to find the valley between
background correlation and real matches, returning that cut-off plus a
separability score so a unimodal (no-match) surface is flagged. match_auto
returns one peak per above-threshold region via connected_boxes.
ccoeff_normed scores span [-1, 1], so the Otsu cut-off can legitimately be
negative on some OpenCV builds (match_auto clamps it with floor anyway). Assert
the threshold is below a perfect match and use a relative separability check
(bimodal > flat) instead of absolute bounds.
…resh-batch

Add auto-thresholded template matching (Otsu on score map)
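For readers unfamiliar with Otsu on a score histogram, here is a small stdlib-only sketch. It assumes the score map has been flattened to a list of floats in [-1, 1]; `otsu_threshold`, the bin count and the definition of separability as between-class variance over total variance are choices made for this example, not necessarily what `match_autothresh` does.

```python
def otsu_threshold(scores, bins=64, lo=-1.0, hi=1.0):
    """Return (threshold, separability) for a flat list of scores in [lo, hi]."""
    if not scores:
        raise ValueError("scores must not be empty")
    hist = [0] * bins
    for s in scores:
        i = int((s - lo) / (hi - lo) * bins)
        hist[min(max(i, 0), bins - 1)] += 1
    total = len(scores)
    centres = [lo + (i + 0.5) * (hi - lo) / bins for i in range(bins)]
    mean_all = sum(c * h for c, h in zip(centres, hist)) / total
    var_all = sum(h * (c - mean_all) ** 2 for c, h in zip(centres, hist)) / total
    best_var, best_i, w0, sum0 = 0.0, 0, 0, 0.0
    for i in range(bins - 1):
        w0 += hist[i]
        sum0 += centres[i] * hist[i]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (mean_all * total - sum0) / w1
        between = (w0 / total) * (w1 / total) * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_i = between, i
    # Cut at the upper edge of the last "background" bin.
    threshold = lo + (best_i + 1) * (hi - lo) / bins
    # Near 1 for a cleanly bimodal histogram, 0 for a single spike.
    separability = best_var / var_all if var_all > 0 else 0.0
    return threshold, separability
```

As the follow-up commit notes, nothing stops the cut-off from being negative when the background correlation sits below zero, so a caller should compare separability between surfaces instead of asserting absolute bounds.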
ocr/structure detects a table only when every row's cell-left-x matches, so it
fails on ragged / borderless / right-aligned columns; edge_lines.find_grid
needs ruling lines, which a whitespace table lacks. Find columns by the gaps:
project OCR boxes onto the x-axis, read the persistent empty vertical bands as
gutters, assign column indices, bucket rows by spacing, emit the table. Pure
difference-array projection, no numpy.
…t-batch

Add column_layout: infer columns from whitespace (borderless tables)
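The difference-array projection can be sketched like this. The example assumes integer pixel coordinates, non-negative `x`, and a non-empty box list; `find_columns`, `assign_columns` and `min_gutter` are hypothetical names, and row bucketing is omitted.

```python
def find_columns(boxes, min_gutter=8):
    """Return (start, end) x-spans of columns; end is exclusive.

    boxes: non-empty list of dicts with integer x and w.
    """
    width = max(b["x"] + b["w"] for b in boxes)
    diff = [0] * (width + 1)
    for b in boxes:
        diff[b["x"]] += 1            # coverage rises at the left edge
        diff[b["x"] + b["w"]] -= 1   # and falls just past the right edge
    spans, cover, start = [], 0, None
    for x in range(width + 1):
        cover += diff[x]             # running sum = number of boxes covering x
        if cover > 0 and start is None:
            start = x
        elif cover == 0 and start is not None:
            spans.append([start, x])
            start = None
    # Merge ink runs separated by a gap narrower than a real gutter.
    columns = [spans[0]]
    for s, e in spans[1:]:
        if s - columns[-1][1] < min_gutter:
            columns[-1][1] = e
        else:
            columns.append([s, e])
    return [tuple(c) for c in columns]


def assign_columns(boxes, columns):
    """Column index per box, chosen by the box's horizontal centre."""
    out = []
    for b in boxes:
        cx = b["x"] + b["w"] / 2
        out.append(next((i for i, (s, e) in enumerate(columns) if s <= cx < e), None))
    return out
```

Only bands that are empty for every row survive the projection, so ragged left edges and right-aligned numbers still land in the same column.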
…tate

ocr/structure only pairs a label: with the immediately next cell, so it can't
handle label-above-value, two-column key/value, right-aligned values or
non-text widgets, and has no checkbox notion. Pair each label with the nearest
aligned value across directions (right/below) within max_gap, match
free-standing widgets to their nearest label, and read checkbox state from the
box's dark-pixel fill ratio.
…batch

Add form_fields: multi-direction label/value association + checkbox state
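A minimal sketch of the two ideas in this commit: nearest aligned value to the right or below within `max_gap`, and checkbox state from a dark-pixel fill ratio. The function names, dict layout and the `dark_below` / `min_fill` defaults are assumptions for illustration; widget matching is not covered.

```python
def pair_fields(labels, values, max_gap=40):
    """Pair each label with its nearest aligned value (right or below).

    labels / values: dicts with x, y, w, h, text. Returns {label_text: value_text}.
    """
    def overlaps(a0, a1, b0, b1):
        return min(a1, b1) - max(a0, b0) > 0

    pairs = {}
    for lab in labels:
        best = None
        for val in values:
            right_gap = val["x"] - (lab["x"] + lab["w"])
            below_gap = val["y"] - (lab["y"] + lab["h"])
            same_row = overlaps(lab["y"], lab["y"] + lab["h"], val["y"], val["y"] + val["h"])
            same_col = overlaps(lab["x"], lab["x"] + lab["w"], val["x"], val["x"] + val["w"])
            if same_row and 0 <= right_gap <= max_gap:
                cand = (right_gap, val["text"])
            elif same_col and 0 <= below_gap <= max_gap:
                cand = (below_gap, val["text"])
            else:
                continue
            if best is None or cand < best:
                best = cand  # smallest gap wins
        if best is not None:
            pairs[lab["text"]] = best[1]
    return pairs


def checkbox_checked(gray, dark_below=128, min_fill=0.2):
    """gray: 2-D list of 0-255 pixels cropped to the box interior."""
    pixels = [p for row in gray for p in row]
    return sum(p < dark_below for p in pixels) / len(pixels) >= min_fill
```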
screen_state/element_diff report what changed but never tie it to the action;
loop_guard only flags a no-op after the same digest repeats N times. Diff the
before/after observation and, given the action's target point, classify the
result on the first step as no_op / changed_near_target / changed_elsewhere /
changed, returning the changed centres and a reason. Reuses
element_diff.match_elements and observation_delta's field-change check.
…t-batch

Add action_effect: classify did-my-click-do-anything with attribution
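The four verdicts can be illustrated with a small classifier. This sketch assumes frames are dicts keyed by stable element ID, that `changed` is the verdict used when no target point is supplied, and that "near" means within a fixed pixel `radius`; all three are assumptions, as are the function names.

```python
from math import hypot


def changed_centres(before, after, fields=("text", "enabled")):
    """Centres of elements that were added, removed or changed in a tracked field."""
    centres = []
    for eid in before.keys() | after.keys():
        old, new = before.get(eid), after.get(eid)
        if old is None or new is None or any(old.get(f) != new.get(f) for f in fields):
            el = new if new is not None else old
            centres.append((el["x"] + el["w"] / 2, el["y"] + el["h"] / 2))
    return sorted(centres)


def classify_effect(before, after, target=None, radius=60):
    """Return {'verdict', 'centres', 'reason'} for one action."""
    centres = changed_centres(before, after)
    if not centres:
        verdict, reason = "no_op", "no element changed between before and after"
    elif target is None:
        verdict, reason = "changed", "elements changed; no target point to attribute to"
    else:
        near = [c for c in centres
                if hypot(c[0] - target[0], c[1] - target[1]) <= radius]
        if near:
            verdict = "changed_near_target"
            reason = f"{len(near)} of {len(centres)} changes within {radius}px of target"
        else:
            verdict = "changed_elsewhere"
            reason = f"{len(centres)} changes, none within {radius}px of target"
    return {"verdict": verdict, "centres": centres, "reason": reason}
```

Unlike a digest-repeat check, this gives a verdict after a single step, since it only needs the before and after frames of that step.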
Intensity NCC drops when a control is re-filled or re-themed, and ORB needs
corner texture that flat-design glyphs lack. Match by edge shape instead: Canny both
images, distance-transform the scene edges, slide the template's edges over it
and score by mean edge-to-edge distance (Chamfer). A perfect outline aligns at
~0 cost regardless of fill. Reuses visual_match's loaders/resize/NMS/Match and
edge_lines's Canny default.
…atch

Add edge_match: edge-shape (Chamfer) template matching
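To make the Chamfer scoring concrete, here is a toy pure-Python version. It assumes edges are already extracted as lists of `(x, y)` points (the commit uses Canny for that step) and swaps in a city-block BFS distance transform for simplicity; in an OpenCV pipeline `cv2.Canny` and `cv2.distanceTransform` would normally fill these roles. `distance_transform` and `chamfer_match` are hypothetical names, and both edge lists are assumed non-empty.

```python
from collections import deque


def distance_transform(edges, width, height):
    """City-block distance from every pixel to the nearest edge pixel (multi-source BFS)."""
    dist = [[None] * width for _ in range(height)]
    queue = deque()
    for x, y in edges:
        dist[y][x] = 0
        queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((nx, ny))
    return dist


def chamfer_match(scene_edges, scene_size, template_edges, template_size):
    """Slide the template edges over the scene; return (best_cost, x, y).

    Cost is the mean distance from each template edge pixel to the nearest
    scene edge, so a perfectly aligned outline scores 0.
    """
    (sw, sh), (tw, th) = scene_size, template_size
    dist = distance_transform(scene_edges, sw, sh)
    best = None
    for oy in range(sh - th + 1):
        for ox in range(sw - tw + 1):
            cost = sum(dist[oy + y][ox + x] for x, y in template_edges) / len(template_edges)
            if best is None or cost < best[0]:
                best = (cost, ox, oy)
    return best
```

Only edge positions enter the cost, so a button whose fill colour changed but whose outline did not still aligns at zero cost.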
expect_poll/assert_eventually poll a single condition with no action-bound spec
and no before-baseline, so they can't express 'a new dialog appeared';
trajectory_eval is whole-trajectory. Evaluate a small JSON spec of clauses
(appears/disappears/enabled/disabled/text_present/text_absent/count) against
the after-observation, diffed against the before-observation, returning a
per-clause pass/fail report. compile_postcondition yields an after->bool
predicate for expect_poll.
…n-batch

Add postcondition: declarative expected-outcome specs for actions
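A sketch of how such a clause spec might be evaluated. The JSON shape used here (`kind`, `match`, `text`, `equals`) is invented for the example, since the PR does not show the real schema; observations are assumed to be lists of element dicts, and the signature of `compile_postcondition` is likewise a guess at the after->bool predicate mentioned above.

```python
def evaluate_postcondition(spec, before, after):
    """Evaluate each clause against `after`, using `before` as the baseline."""
    def find(obs, match):
        return [el for el in obs if all(el.get(k) == v for k, v in match.items())]

    def all_text(obs):
        return " ".join(el.get("text", "") for el in obs)

    report = []
    for clause in spec:
        kind, match = clause["kind"], clause.get("match", {})
        hits_before, hits_after = find(before, match), find(after, match)
        if kind == "appears":          # present now, absent in the baseline
            ok = bool(hits_after) and not hits_before
        elif kind == "disappears":     # present in the baseline, gone now
            ok = bool(hits_before) and not hits_after
        elif kind == "enabled":
            ok = bool(hits_after) and all(el.get("enabled", True) for el in hits_after)
        elif kind == "disabled":
            ok = bool(hits_after) and all(not el.get("enabled", True) for el in hits_after)
        elif kind == "text_present":
            ok = clause["text"] in all_text(after)
        elif kind == "text_absent":
            ok = clause["text"] not in all_text(after)
        elif kind == "count":
            ok = len(hits_after) == clause["equals"]
        else:
            raise ValueError(f"unknown clause kind: {kind}")
        report.append({"kind": kind, "passed": ok})
    return {"passed": all(r["passed"] for r in report), "clauses": report}


def compile_postcondition(spec, before):
    """Bind the spec and baseline into an after -> bool predicate."""
    return lambda after: evaluate_postcondition(spec, before, after)["passed"]
```

The `appears` clause is what a single-condition poll cannot express: a dialog that was already open before the click does not count as having appeared.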
self_healing/locator_repair only fix a locator that didn't resolve; loop_guard
only detects a stuck loop with no tactic selection or backoff. Consume an
effect verdict and drive a bounded retry loop, choosing the next untried tactic
(wait_retry/relocate/nudge/scroll_into_view/escalate) each round. Pure-stdlib
state machine with injected act/verify/apply_tactic/verdict_for/sleep seams.
…batch

Add step_repair: repair-tactic policy for failed / no-effect actions
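The bounded retry loop with injected seams might look roughly like the following. It is a simplified sketch: it keeps the `act` / `verify` / `apply_tactic` / `sleep` seams named in the commit but drops `verdict_for`, and the tactic ordering, exponential backoff and return shape are assumptions.

```python
TACTICS = ("wait_retry", "relocate", "nudge", "scroll_into_view", "escalate")


def run_with_repair(act, verify, apply_tactic, sleep=lambda seconds: None,
                    max_rounds=3, backoff=0.5):
    """Run an action, then try one untried tactic per round until verify() passes.

    All side effects go through the injected callables, so the loop itself is
    a pure state machine that can be unit-tested without a screen.
    """
    tried = []
    act()
    if verify():
        return {"ok": True, "tactics": tried}
    for round_no in range(max_rounds):
        tactic = next((t for t in TACTICS if t not in tried), None)
        if tactic is None or tactic == "escalate":
            break
        tried.append(tactic)
        sleep(backoff * 2 ** round_no)  # back off a little longer each round
        apply_tactic(tactic)
        act()
        if verify():
            return {"ok": True, "tactics": tried}
    # Out of rounds or out of repair tactics: hand the step to the caller.
    return {"ok": False, "tactics": tried + ["escalate"]}
```

Passing a no-op `sleep` in tests keeps them instant, while production code can inject `time.sleep`.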
@codacy-production commented

Up to standards ✅

🟢 Issues: 0 new issues

🟢 Metrics: complexity 427 · duplication 10


@JE-Chen JE-Chen merged commit 4b5fd63 into main Jun 23, 2026
31 checks passed